Abstract. In self-organizing networks, topology and dynamics coevolve
in a continuous feedback, without exogenous driving. The World Trade
Network (WTN) is one of the few empirically well-documented examples
of self-organizing networks: its topology strongly depends on the GDP
of world countries, which in turn depends on the structure of trade.
Therefore, understanding which key topological properties of the WTN
deviate from randomness provides direct empirical information about the
structural effects of self-organization. Here, using an analytical
pattern-detection method that we have recently proposed, we study the
occurrence of triadic 'motifs' (subgraphs of three vertices) in the WTN
between 1950 and 2000. We find that, unlike other properties, motifs are
not explained by the in- and out-degree sequences alone. By contrast,
they are completely explained if the numbers of reciprocal edges are also
taken into account. This implies that the self-organization process
underlying the evolution of the WTN is almost completely encoded into
the dyadic structure, which strongly depends on reciprocity.

1 Introduction 

The global economy is a prototypical example of a complex self-organizing
system, whose collective properties emerge spontaneously through many local
interactions. In particular, international trade between countries defines a
complex network which arises as the combination of many independent choices of
firms. It was shown that the topology of the World Trade Network (WTN) strongly
depends on the Gross Domestic Product (GDP) of world countries [1]. On the
other hand, the GDP depends on international trade by definition [2], which
implies that the WTN is a remarkably well-documented example of an adaptive
network, where dynamics and topology coevolve in a continuous feedback. In
general, understanding self-organizing networks is a major challenge for science,
as only a few models of such networks are analytically solvable [3]. However, in
the particular case of the WTN, the binary topology of the network is found to be
extremely well reproduced by a null model which incorporates the degree
sequence [4].
These results, which have been obtained using a fast network randomization
method that we have recently proposed [5], make the WTN particularly
interesting. In this paper, after briefly reviewing our randomization method, we
apply it to study the occurrence of triadic 'motifs', i.e. directed patterns
involving three vertices (see Fig. 1). We show that, unlike other properties which
have been studied elsewhere [4], the occurrence of motifs is not explained by the
in- and out-degrees of vertices alone. However, if the numbers of reciprocal links
of each vertex (the reciprocal degree sequence) are also taken into account, the
occurrences of triadic motifs are almost completely reproduced. This implies
that, if local information is enhanced in order to take into account the
reciprocity structure, motifs display no significant deviations from random
expectations. Therefore the (in principle complicated) self-organization process
underlying the evolution of the WTN turns out to be relatively simply encoded
into the local dyadic structure, which separately specifies the number of
reciprocated and non-reciprocated links per vertex. Thus the dyadic structure
appears to carry a large amount of information about the system.

2 Searching for Non-Random Patterns in the WTN 

In this section we briefly summarize our recently proposed randomization method 
and how it can be used to detect patterns when local constraints are considered. 

2.1 Network Pattern Detection: the Randomization Method 

Our method, which is based on the maximum-likelihood estimation of
maximum-entropy models of graphs, introduces a family of null models of a real
network and uses it to detect topological patterns analytically [5]. Defining a
null model means setting up a method to assign probabilities [6,7,8]. In our
approach, a real network $G^*$ with $N$ vertices is given (either a binary or a
weighted graph, and either directed or undirected, whose generic entry is
$g_{ij}$) and a way to generate a family $\mathcal{G}$ of randomized variants of
$G^*$ is provided, by assigning each graph $G \in \mathcal{G}$ a probability
$P(G)$. In the method, the probabilities $P(G)$ are such that a maximally random
ensemble of networks is generated, under the constraint that, on average, a set
$\{C_a\}$ of desired topological properties is set equal to the values
$\{C_a(G^*)\}$ observed in the real network $G^*$. This is achieved as the result
of a constrained Shannon-Gibbs entropy maximization [8]

$$ S \equiv -\sum_G P(G) \ln P(G) \qquad (1) $$

and the imposed constraints are the normalization of the probability and the
average values of a number of chosen properties, $\{C_a\}$:

$$ 1 = \sum_G P(G), \qquad \langle C_a \rangle = \sum_G C_a(G) P(G). \qquad (2) $$

This optimization leads to exponential probability coefficients [8]

$$ P(G|\theta) = \frac{e^{-H(G,\theta)}}{Z(\theta)} \qquad (3) $$

where the linear combination of the constraints
$H(G,\theta) = \sum_a \theta_a C_a(G)$ is called the graph Hamiltonian (the
coefficients $\{\theta_a\}$ are free parameters, acting as Lagrange multipliers
controlling the expected values $\{\langle C_a \rangle\}$) and the denominator
$Z(\theta) = \sum_G e^{-H(G,\theta)}$ is called the partition function.

The next step is the maximization of the probability $P(G^*)$ to obtain the
observed graph $G^*$, i.e. the real-world network to randomize [5]. This step fixes
the values of the Lagrange multipliers as they are found by maximizing the
log-likelihood

$$ \mathcal{L}(\theta) \equiv \ln P(G^*|\theta) = -H(G^*,\theta) - \ln Z(\theta) \qquad (4) $$

to obtain the real network $G^*$. It can be easily verified [5] that this is achieved
by the parameter values $\theta^*$ satisfying

$$ \langle C_a \rangle^* \equiv \sum_G C_a(G) P(G|\theta^*) = C_a(G^*) \quad \forall a \qquad (5) $$

that is, that the ensemble average of each constraint, $\langle C_a \rangle$,
equals the observed value on the real network, $C_a(G^*)$. Once the numerical
values of the Lagrange multipliers are found, they can be used to find the
ensemble average $\langle X \rangle^*$ of any topological property $X$ of
interest [5]:

$$ \langle X \rangle^* \equiv \sum_G X(G) P(G|\theta^*). \qquad (6) $$
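The step from Eq. (4) to Eq. (5) can be made explicit. The following short derivation is a sketch that assumes only the definitions of $H(G,\theta)$ and $Z(\theta)$ given above; the intermediate notation $\langle C_a \rangle_\theta$ (the ensemble average at a generic parameter value) is introduced here for illustration.

```latex
\begin{align*}
\frac{\partial \mathcal{L}(\theta)}{\partial \theta_a}
  &= -\frac{\partial H(G^*,\theta)}{\partial \theta_a}
     - \frac{1}{Z(\theta)} \frac{\partial Z(\theta)}{\partial \theta_a} \\
  &= -C_a(G^*) + \sum_G C_a(G)\, \frac{e^{-H(G,\theta)}}{Z(\theta)} \\
  &= -C_a(G^*) + \langle C_a \rangle_\theta .
\end{align*}
% Setting the derivative to zero at \theta = \theta^* gives
% \langle C_a \rangle^* = C_a(G^*) for every a, i.e. Eq. (5).
```

In words, the likelihood is stationary exactly where every enforced constraint is matched on average, which is why the maximum-likelihood parameters and the constraint-matching parameters coincide.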

The exact computation of the expected values can be very difficult. For this
reason it is often necessary to rely on the linear approximation method [5].
However, in the present study we will consider particular topological properties
$X$ (i.e. motif counts, see below) whose expected value can be evaluated exactly.
Our method also allows one to obtain the variance of $X$ by applying the usual
definition:

$$ \sigma^2[X] = \langle [X(G) - \langle X \rangle]^2 \rangle = \sum_{i,j} \sum_{t,s} \sigma[g_{ij}, g_{ts}] \left( \frac{\partial X}{\partial g_{ij}} \frac{\partial X}{\partial g_{ts}} \right)_{G=\langle G \rangle} \qquad (7) $$

where $\sigma[g_{ij}, g_{ts}]$ is the covariance of the adjacency matrix elements
$g_{ij}$ and $g_{ts}$. This formula can be greatly simplified by considering
probabilities that are factorizable in terms of dyadic probabilities, as follows

$$ P(G|\theta) = \prod_{i<j} D_{ij}(g_{ij}, g_{ji}|\theta) \qquad (8) $$

where the product runs over all the dyads, that is the unordered pairs of vertices
$(i,j)$ (with $i < j$), and $D_{ij}(g, g'|\theta)$ is the joint probability that
$g_{ij} = g$ and $g_{ji} = g'$. Finally, the variance of $X$ evaluated at
$\theta^*$ becomes



$$ (\sigma^*[X])^2 = \sum_{i,j} \left[ \left( \sigma^*[g_{ij}] \frac{\partial X}{\partial g_{ij}} \right)^2_{G=\langle G \rangle^*} + \sigma^*[g_{ij}, g_{ji}] \left( \frac{\partial X}{\partial g_{ij}} \frac{\partial X}{\partial g_{ji}} \right)_{G=\langle G \rangle^*} \right] \qquad (9) $$

The joint knowledge of $\langle X \rangle^*$ and $\sigma^*[X]$ allows one to detect
deviations from randomness in the observed topology. In particular, as we show
later, it is possible to calculate by how many standard deviations the observed
value $X^*$ differs from the expected value $\langle X \rangle^*$. Quantities which
are consistent with their expected value are explained by the enforced
constraints $\{C_a\}$. On the other hand, significantly deviating properties cannot
be traced back to the constraints and therefore signal the incompleteness of the
information encoded in the constraints. Other approaches achieve this result by
explicitly generating many randomized variants of the real network, measuring
$X$ on each such variant, and finally computing the sample average and standard
deviation of $X$ [6]. This is extremely time consuming, especially for
complicated topological properties. By contrast, our method is entirely
analytical. It yields any expected quantity $\langle X \rangle^*$ in a time as
short as that required to measure $X^*$ on the single network $G^*$ [5].

2.2 The Role of Local Constraints 

If the network is a binary graph (i.e. if each graph G in the ensemble is uniquely 
specified by its adjacency matrix A), then the simplest (i.e. local) choice of 
the constraints is the degree sequence, i.e. the vector of degrees (numbers of
incident links) of all vertices. For directed networks, which are our interest here,
there are actually two degree sequences: the observed in-degree sequence
$\mathbf{k}^{in}(A^*)$ (with $k_i^{in} = \sum_{j \neq i} a_{ji}$) and the observed
out-degree sequence $\mathbf{k}^{out}(A^*)$ (with
$k_i^{out} = \sum_{j \neq i} a_{ij}$). This null model, which is known as the
directed configuration model (DCM), can be completely dealt with analytically
using our method
(see Appendix). When applied to the WTN, the DCM shows that many topo- 
logical properties (such as the degree-degree correlations and the directed clus- 
tering coefficients) are in complete accordance with the expectations [1]. This 
shows that the degree sequences $\mathbf{k}^{in}$ and $\mathbf{k}^{out}$ are extremely
informative, as their (partial) knowledge allows one to reconstruct many aspects
of the (complete) topology. On the other hand, it was also shown that the reciprocity of the WTN
is highly non-trivial [10]. This means that the occurrence of reciprocal links is 
much higher than expected under any model which, like the DCM, treats two
reciprocal links (e.g. $i \to j$ and $j \to i$) as statistically independent. A direct
consequence is that the reciprocity, as well as any higher-order directional pat- 
tern, should not be reproduced by the DCM. These seemingly conflicting results 
can only be reconciled if, for some reason, the topological properties that have 
been studied under the DCM [4] mask the effects of reciprocity. In particular, 
the directed clustering coefficients, which are based on ratios of realized triangles 
over the maximum number for each vertex, may show no overall deviation from 
the DCM, even if the numerator and denominator separately deviate from it. 
In what follows, we investigate this possibility by considering all the observed 
subgraphs of three vertices (which include both open and closed triangles)
separately. Also, we will use an additional null model which also takes the number
of reciprocal links of each vertex into account. This second null model is the
reciprocal configuration model (RCM) [5,10,11]. The local constraints defining it
are the three observed directed-degree sequences $\mathbf{k}^{\to}(A^*)$, with
$k_i^{\to} \equiv \sum_{j \neq i} a_{ij}^{\to}$ (non-reciprocated out-degree),
$\mathbf{k}^{\leftarrow}(A^*)$, with
$k_i^{\leftarrow} \equiv \sum_{j \neq i} a_{ij}^{\leftarrow}$ (non-reciprocated
in-degree) and $\mathbf{k}^{\leftrightarrow}(A^*)$, with
$k_i^{\leftrightarrow} \equiv \sum_{j \neq i} a_{ij}^{\leftrightarrow}$
(reciprocated degree), where the dyadic variables $a_{ij}^{\to}$,
$a_{ij}^{\leftarrow}$ and $a_{ij}^{\leftrightarrow}$ are defined in Eq. (10). These
sequences are imposed across the ensemble of networks having the same number of
vertices as the observed configuration and, on average, the above-mentioned
directed-degree sequences. In the Appendix we describe both the DCM and the RCM
in more detail, and derive their expectations explicitly.



2.3 Triadic Motifs in the WTN 

In the following analyses, we use yearly bilateral data on exports and imports
from the Gleditsch Database¹ to analyse the six years 1950, 1960, 1970, 1980,
1990, 2000. This database contains aggregated trade data between countries, i.e.
data obtained by summing the single commodity-specific trade exchanges.
We therefore end up with six different, real, asymmetric matrices with entries
$m_{ij}(y)$ ($y$ = 1950, 1960 . . . 2000). These matrix elements are the fundamental
data allowing us to obtain all the possible representations of the WTN: to build
the binary, directed representation we are interested in here, we consider two
different vertices as linked whenever the corresponding element $m_{ij}(y)$ is
strictly positive. This implies that the adjacency matrix of the binary,
directed representation of the WTN in year $y$ is simply obtained by applying
the Heaviside step function to the database entries, i.e.
$a_{ij}(y) = \Theta[m_{ij}(y)]$.
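The binarization step described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `binarize` and the toy matrix `m` are hypothetical, and the example assumes NumPy is available.

```python
import numpy as np

def binarize(m):
    """Binary directed adjacency matrix a_ij = Theta[m_ij] (assumed helper)."""
    m = np.asarray(m, dtype=float)
    a = (m > 0).astype(int)   # link i -> j whenever the entry is strictly positive
    np.fill_diagonal(a, 0)    # a country does not trade with itself
    return a

# Toy weighted matrix (made-up numbers, not taken from the Gleditsch data)
m = np.array([[0.0, 12.5, 0.0],
              [3.1,  0.0, 0.0],
              [0.0,  7.4, 0.0]])
a = binarize(m)
```

Here the first two vertices are linked in both directions, while the third one only exports to the second.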

Triadic motifs, i.e. all the possible directed patterns connecting three
vertices, are the natural generalizations of directed clustering coefficients [12]
and the starting point for the understanding of a complex network's
self-organization in communities. Thirteen non-isomorphic triadic directed
patterns can be identified and classified [13]. Given a real, binary, directed
matrix $A^*$, the motif occurrences $N_m$ can be written in at least two different
ways (see Table 1). The first one defines them in terms of the adjacency matrix
entries, $\{a_{ij}\}$. The second one expresses the coefficients $N_m$ compactly by
introducing the following dyadic variables

$$ a_{ij}^{\to} \equiv a_{ij}(1-a_{ji}), \quad a_{ij}^{\leftarrow} \equiv a_{ji}(1-a_{ij}), \quad a_{ij}^{\leftrightarrow} \equiv a_{ij}a_{ji}, \quad a_{ij}^{\nleftrightarrow} \equiv (1-a_{ij})(1-a_{ji}) \qquad (10) $$
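The dyadic variables of Eq. (10), and the three directed-degree sequences built from them, can be computed directly from the adjacency matrix. The sketch below assumes NumPy; the function name `dyadic_variables` and the three-vertex example are illustrative choices, not part of the original method description.

```python
import numpy as np

def dyadic_variables(a):
    """Return the four dyadic matrices of Eq. (10) for a binary matrix a."""
    a = np.asarray(a, dtype=int)
    at = a.T
    off = 1 - np.eye(len(a), dtype=int)   # mask that excludes i == j
    right = a * (1 - at) * off            # i -> j only
    left = at * (1 - a) * off             # j -> i only
    both = a * at * off                   # reciprocated link
    none = (1 - a) * (1 - at) * off       # no link in either direction
    return right, left, both, none

# Toy network: 0 <-> 1, 0 -> 2, 2 -> 1
a = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]])
right, left, both, none = dyadic_variables(a)
k_right = right.sum(axis=1)   # non-reciprocated out-degrees
k_left = left.sum(axis=1)     # non-reciprocated in-degrees
k_both = both.sum(axis=1)     # reciprocated degrees
```

By construction, `k_right + k_both` recovers the out-degree sequence and `k_left + k_both` the in-degree sequence, which is a convenient sanity check.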

thus making the role of reciprocity explicit. However, the number of occurrences
of the particular motif $m$ (where $m$ ranges from 1 to 13) as measured on the
observed network $A^*$ is uninformative unless a comparison with a properly
defined null model is made (see Appendix). This implies that the occurrence of a
motif should be compared with its expected value $\langle N_m \rangle^*$, as
computed under the chosen null model. This can be compactly achieved by
introducing the so-called z-score

$$ z[N_m] \equiv \frac{N_m(A^*) - \langle N_m \rangle^*}{\sigma^*[N_m]} \qquad (11) $$



¹ Gleditsch, K.S.: Expanded trade and GDP data. Jour. Confl. Res. 46 (2002) 712-724.

Fig. 1. The triadic, binary, directed motifs



Table 1. Classification (after [T3]) and definitions of the triadic motifs 


Motif m N m : 1 st definition 




N m : 2 nd definition 


1 


Ei^fcC 1 - aij)ajiaj k (l - a kj )(l - a ik ){l - 


a k 


) Ei^j'^fc a ij a jk a ik 


2 


E l7 ^fc ~ aji)a jk (l - a kj ){\ - a ik ){l - 


(ik 


i) Yi^j 7 t k a ij a jk a ik 


3 


Ei^fc aijajia jk {l - a kj )(l - a ik )(l - a ki ) 




<-> — > 
^i^j^k a ij a jk a ik 


4 


E^.,vfc( 1 - aij){l - a Jl )a jk (l - a k] )a xk {\ - 


a k 


) Ei^,^fe a ij a jk a ik 


5 


E l ^.,vfc( 1 - aij)a,jia jk (l - a fej )a 4fe (l - a fci ) 




\ -* — > — > 
Z^i^j^k a ij a ]k a ik 


6 


E^j/fc aijajiajfe(l - a kj )a ik (l - a ki ) 




»J a jk a ik 


7 


E^.,^fc aijCiji(l - a jk )a k j(l - a lfc )(l - a ki ) 




v~~* <-> «— *^ 
l^i^j^k a ij a jk a ik 


8 


E l7 y^fc anajiajkakj(l - a ik )(l - a ki ) 




Ei^j^fc a ij a jk a ik 


9 


E l ^. ) ^fc( 1 ~ aij)aji(l - a jk )a kj a ik (l - a ki ) 




l^i^j^k a ij a jk a ik 


10 


E l ^ ;) ^fc( 1 - aij)a iaj k a k ja ik (l - a ki ) 




Ei^j^fc a ij a jk a ik 


11 


E l7 y^fc "-iji 1 - a]i)a jk a kj a ik (l - a ki ) 




Ei^j^fc a ij a jk a ik 


12 


Ei^j^fc o,ijajiaj k a k jai k (l — a ki ) 




Ei^j^fc a ij a jk a ik 


13 


y \i-£j-£ k CljiCLj k Cl k j CLi k Cl k i 







measuring by how many standard deviations, $\sigma^*[N_m]$, the observed and the
expected occurrences of motif $m$ differ. Large absolute values of $z[N_m]$ indicate
motifs that are either over- or under-represented under the particular null model
considered and therefore not explained by the constraints defining it, as shown
in Fig. 2 and Fig. 3 and discussed in the next section.
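As an illustration of the second definition in Table 1 and of the z-score, the sketch below counts all 13 motifs with one `einsum` call per motif. It assumes NumPy; the shorthand strings in `MOTIFS` (">" for a link $i \to j$ only, "<" for $j \to i$ only, "=" for a reciprocated link, "0" for no link) and all function names are our own illustrative conventions, not notation from the original analysis.

```python
import numpy as np

# Dyad types on the pairs (i,j), (j,k), (i,k) for each motif m (Table 1, 2nd definition)
MOTIFS = {1: "<>0", 2: ">>0", 3: "=>0", 4: "0>>", 5: "<>>", 6: "=>>", 7: "=<0",
          8: "==0", 9: "<<>", 10: "<=>", 11: ">=>", 12: "==>", 13: "==="}

def dyadic_variables(a):
    """Dyadic matrices of Eq. (10), with zero diagonals."""
    a = np.asarray(a, dtype=float)
    off = 1.0 - np.eye(len(a))   # excludes i == j
    return {">": a * (1 - a.T) * off, "<": a.T * (1 - a) * off,
            "=": a * a.T * off, "0": (1 - a) * (1 - a.T) * off}

def motif_counts(a):
    """N_m for m = 1..13; zero diagonals make the sum run over distinct i, j, k."""
    d = dyadic_variables(a)
    return {m: int(round(float(np.einsum("ij,jk,ik->", d[t[0]], d[t[1]], d[t[2]]))))
            for m, t in MOTIFS.items()}

def z_score(observed, expected, sigma):
    """Number of standard deviations between observed and expected counts."""
    return (observed - expected) / sigma

# Directed 3-cycle 0 -> 1 -> 2 -> 0
cycle = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [1, 0, 0]])
counts = motif_counts(cycle)
```

Because the sums run over ordered triples, each subgraph is counted once per labelling that matches the pattern: the single directed cycle above gives `counts[9] == 3`, and a fully reciprocated triangle would give `counts[13] == 6`.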

3 Results and Discussion 

Fig. 2 and Fig. 3 show the z-scores for all the 13 triadic, binary, directed motifs,
computed for the six different snapshots of the World Trade Network corresponding
to the years 1950, 1960, 1970, 1980, 1990, 2000, for both the DCM and
the RCM. We also show the six lines z = ±1, z = ±2 and z = ±3 to highlight
the region within 3 sigmas from the expectation value. The analysis reveals a
dramatic difference between the predictions of the two null models. The presence
of intrinsically directed trading relationships implies that reciprocity is a
fundamental quantity, shaping the network of exchanges among world countries.

The reciprocity of the WTN is known to be very high [10] and this has
strong effects on its motif structure. This implies that, in order to reproduce
the topology of the network, it is essential to reproduce its dyadic structure,
which in turn strongly depends on reciprocity. This also implies that the RCM
should fluctuate less than the DCM, because of this huge amount of additional
information. Indeed, this is confirmed by our analysis.

Fig. 2. z-scores of the 13 triadic, binary, directed motifs for the six decades of the
WTN, under the DCM (■) and under the RCM (•). The dashed, red lines represent
the values z = ±3, the dotted, purple lines the values z = ±2 and the dot-dashed, pink
lines the values z = ±1.

In fact, we find that, unlike other topological properties [1], triadic motifs are
systematically not reproduced if only the in-degree and out-degree sequences are
taken into account. In particular, the observed motif counts deviate by up to 100
standard deviations from the expected ones. By contrast, if the reciprocity
(which separately specifies the number of reciprocated links) is also introduced
in the null model, triadic motifs are almost always consistent with expectations
within one standard deviation, as for m = 5, 10, 11, 12, or at most two standard
deviations, as for m = 1, 4. Moreover, the significance profiles are almost
completely inverted: some of the motifs that are over(under)-represented without
the reciprocity constraint, such as m = 8 (m = 2), become under(over)-represented
with the reciprocity constraint.

Fig. 3. z-scores of the 13 triadic, binary, directed motifs for the six decades of the
WTN, under the DCM (■) and under the RCM (•): zoom of Fig. 2. The dashed, red
lines represent the values z = ±3, the dotted, purple lines the values z = ±2 and the
dot-dashed, pink lines the values z = ±1.



4 Conclusions 



The WTN is a particularly interesting network which is known to be driven by a
self-organization process involving the global economy. Empirically, it is a very
well-documented network which allows one to test predictions of null models about
the key ingredients shaping its topology. In this paper, we aimed at isolating the
key properties of the WTN topology where the effects of self-organization can
be clearly detected as deviations from randomness. While the DCM, which takes
into account the number of incoming and outgoing connections of all vertices,
reproduces many other topological properties, it cannot replicate the observed
triadic motifs. On the other hand, we found that the RCM, which also takes
into account the numbers of reciprocated links, can replicate almost perfectly
the triadic structure. We therefore found that the process underlying the
evolution of the WTN is mainly encoded into the dyadic structure, which carries a
large amount of information about the system: the upgrade to the RCM is therefore
necessary, and the possibility to treat the RCM analytically using our method is
an important step forward. The result that dyadic properties almost
completely explain triadic ones suggests that in the WTN higher-order properties
(e.g. subgraphs of four or more vertices, and in general even the existence
of denser communities of vertices) will also be explained mostly by dyadic
properties. This raises the important question of whether the available community
detection methods are successful in identifying communities which are not
explained by local constraints. In particular, our results suggest that
modularity-based community detection methods will detect communities in the WTN
if a null model without reciprocity (DCM) is used, while they should find weak or
no community structure if a null model including reciprocity (RCM) is used. In
both cases, as we have already claimed [3], the available expressions for the
modularity (which make the strong assumption of sparse networks) should be revised
using the correct expectation values provided by our method, since the WTN is a
very dense network. We will address these issues in future work.

5 Appendix

5.1 The Directed Configuration Model 

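Given a real, binary, directed network $A^*$ with out-degree and in-degree
sequences $\mathbf{k}^{out}(A^*)$ and $\mathbf{k}^{in}(A^*)$, the method described in
Section 2.1 can be specified in the following way.

The DCM Hamiltonian. The Hamiltonian implementing the DCM is

$$ H(A,\alpha,\beta) = \sum_i [\alpha_i k_i^{out}(A) + \beta_i k_i^{in}(A)] = \sum_{i \neq j} (\alpha_i + \beta_j) a_{ij}, \qquad (12) $$

the partition function can be calculated as in [8],
$Z(\alpha,\beta) = \sum_A e^{-H(A,\alpha,\beta)} = \prod_{i \neq j} (1 + e^{-\alpha_i-\beta_j})$,
and the graph probability is

$$ P(A|\alpha,\beta) = \prod_{i<j} D_{ij}(a_{ij}, a_{ji}|\alpha_i,\alpha_j,\beta_i,\beta_j) = \prod_{i \neq j} p_{ij}^{a_{ij}} (1-p_{ij})^{1-a_{ij}} \qquad (13) $$

and, by setting $x_i \equiv e^{-\alpha_i}$ and $y_i \equiv e^{-\beta_i}$,

$$ p_{ij} \equiv \frac{e^{-\alpha_i-\beta_j}}{1+e^{-\alpha_i-\beta_j}} = \frac{x_i y_j}{1 + x_i y_j}. \qquad (14) $$

The Log-Likelihood Equations. The log-likelihood function to maximize is

$$ \mathcal{L}(x,y) = \sum_i [k_i^{out}(A^*) \ln x_i + k_i^{in}(A^*) \ln y_i] - \sum_{i \neq j} \ln(1 + x_i y_j) \qquad (15) $$

and the values $x^*$, $y^*$ corresponding to the point of maximum can be found by
solving the following system

$$ k_i^{out}(A^*) = \sum_{j \neq i} \frac{x_i^* y_j^*}{1 + x_i^* y_j^*}, \qquad k_i^{in}(A^*) = \sum_{j \neq i} \frac{x_j^* y_i^*}{1 + x_j^* y_i^*} \qquad \forall i. \qquad (16) $$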
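One possible way to solve the DCM system numerically is a fixed-point iteration, in which each equation is rearranged to isolate $x_i$ or $y_i$. The sketch below is a minimal illustration under stated assumptions: it uses NumPy, the function name `solve_dcm`, the starting point and the stopping rule are our own choices, and plain fixed-point iteration is not guaranteed to converge for every network (a damped update or a general-purpose root finder may be needed in practice). It also assumes the network has at least one link and no vertex whose equations are degenerate.

```python
import numpy as np

def solve_dcm(a, tol=1e-12, max_iter=10000):
    """Fixed-point sketch for the DCM likelihood equations; returns x, y, p."""
    a = np.asarray(a, dtype=float)
    off = 1.0 - np.eye(len(a))                   # excludes j == i from the sums
    k_out, k_in = a.sum(axis=1), a.sum(axis=0)   # observed degree sequences
    x = k_out / np.sqrt(a.sum())                 # heuristic starting point (assumption)
    y = k_in / np.sqrt(a.sum())
    for _ in range(max_iter):
        denom = 1.0 + np.outer(x, y)             # 1 + x_i y_j
        # x_i = k_i^out / sum_{j != i} y_j / (1 + x_i y_j)
        x_new = k_out / (off * y[None, :] / denom).sum(axis=1)
        # y_j = k_j^in / sum_{i != j} x_i / (1 + x_i y_j)
        y_new = k_in / (off * x[:, None] / denom).sum(axis=0)
        step = max(np.abs(x_new - x).max(), np.abs(y_new - y).max())
        x, y = x_new, y_new
        if step < tol:
            break
    xy = np.outer(x, y)
    p = off * xy / (1.0 + xy)                    # connection probabilities p_ij
    return x, y, p

# Directed ring 0 -> 1 -> 2 -> 3 -> 0: every vertex has in- and out-degree 1
ring = np.roll(np.eye(4), 1, axis=1)
x, y, p = solve_dcm(ring)
```

For this regular example every off-diagonal probability is 1/3, so the expected in- and out-degrees are both 1, as required. Note that the solution is defined only up to the rescaling $x \to c\,x$, $y \to y/c$, which leaves all $p_{ij}$ unchanged.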



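Expectation Values and Variances. The expectation value and variance of $a_{ij}$
are, respectively, $\langle a_{ij} \rangle^* = p_{ij}^*$ and
$(\sigma^*[a_{ij}])^2 = \langle a_{ij} \rangle^* (1 - \langle a_{ij} \rangle^*) = p_{ij}^* (1 - p_{ij}^*)$.
Distinct pairs of vertices are independent random variables; since the first
definition of the motif occurrences only involves products of the adjacency matrix
entries, their expectation values can be easily computed, as shown in Table 2.
The variance of $N_m$ becomes

$$ (\sigma^*[N_m])^2 = \sum_{i \neq j} \left( \sigma^*[a_{ij}] \frac{\partial N_m}{\partial a_{ij}} \right)^2_{A=\langle A \rangle^*} \qquad (17) $$

considering that
$\sigma^*[a_{ij}, a_{ji}] = \langle a_{ij} a_{ji} \rangle^* - \langle a_{ij} \rangle^* \langle a_{ji} \rangle^* = 0$.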
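Because all matrix entries are independent under the DCM, the expected motif counts follow from the first definition in Table 1 by replacing each $a_{ij}$ with $p_{ij}$. The sketch below does this for two motifs only (m = 13 and m = 9); it assumes NumPy, and the function name and the uniform test matrix are illustrative.

```python
import numpy as np

def expected_n13_n9_dcm(p):
    """Expected N_13 and N_9 under the DCM, from the 1st definition in Table 1."""
    p = np.asarray(p, dtype=float)
    off = 1.0 - np.eye(len(p))
    p = p * off                 # zero diagonal keeps i, j, k distinct
    q = (1.0 - p) * off         # probability that a given link is absent
    # N_13: a_ij a_ji a_jk a_kj a_ik a_ki
    n13 = np.einsum("ij,ji,jk,kj,ik,ki->", p, p, p, p, p, p)
    # N_9: (1-a_ij) a_ji (1-a_jk) a_kj a_ik (1-a_ki)
    n9 = np.einsum("ij,ji,jk,kj,ik,ki->", q, p, q, p, p, q)
    return float(n13), float(n9)

# Uniform probabilities p_ij = 1/3 on four vertices (e.g. a regular directed ring)
p = (np.ones((4, 4)) - np.eye(4)) / 3.0
n13, n9 = expected_n13_n9_dcm(p)
```

With four vertices there are 24 ordered triples, so the uniform case gives $24 \cdot (1/3)^6$ for m = 13 and $24 \cdot (1/3)^3 (2/3)^3$ for m = 9, which is an easy check on the index bookkeeping.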



5.2 The Reciprocal Configuration Model 

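Given a real binary directed network $A^*$ with directed-degree sequences
$\mathbf{k}^{\to}(A^*)$, $\mathbf{k}^{\leftarrow}(A^*)$ and
$\mathbf{k}^{\leftrightarrow}(A^*)$, where

$$ k_i^{\to}(A^*) \equiv \sum_{j \neq i} a_{ij}^* (1 - a_{ji}^*), \quad k_i^{\leftarrow}(A^*) \equiv \sum_{j \neq i} a_{ji}^* (1 - a_{ij}^*), \quad k_i^{\leftrightarrow}(A^*) \equiv \sum_{j \neq i} a_{ij}^* a_{ji}^*, \qquad (18) $$

the randomization procedure can be specified in the following way.

The RCM Hamiltonian. The Hamiltonian implementing the RCM is

$$ H(A,\alpha,\beta,\gamma) = \sum_i [\alpha_i k_i^{\to}(A) + \beta_i k_i^{\leftarrow}(A) + \gamma_i k_i^{\leftrightarrow}(A)], \qquad (19) $$

the partition function can be calculated as in [11],
$Z(\alpha,\beta,\gamma) = \prod_{i<j} (1 + e^{-\alpha_i-\beta_j} + e^{-\alpha_j-\beta_i} + e^{-\gamma_i-\gamma_j})$,
and the graph probability is

$$ P(A|\alpha,\beta,\gamma) = \prod_{i<j} D_{ij}(a_{ij}, a_{ji}|\alpha_i,\alpha_j,\beta_i,\beta_j,\gamma_i,\gamma_j) = \prod_{i<j} (p_{ij}^{\to})^{a_{ij}^{\to}} (p_{ij}^{\leftarrow})^{a_{ij}^{\leftarrow}} (p_{ij}^{\leftrightarrow})^{a_{ij}^{\leftrightarrow}} (p_{ij}^{\nleftrightarrow})^{a_{ij}^{\nleftrightarrow}} \qquad (20) $$

and, by setting $x_i \equiv e^{-\alpha_i}$, $y_i \equiv e^{-\beta_i}$ and
$z_i \equiv e^{-\gamma_i}$ [11],

$$ p_{ij}^{\to} \equiv \frac{x_i y_j}{1 + x_i y_j + x_j y_i + z_i z_j}, \qquad p_{ij}^{\leftarrow} \equiv \frac{x_j y_i}{1 + x_i y_j + x_j y_i + z_i z_j}, \qquad (21) $$

$$ p_{ij}^{\leftrightarrow} \equiv \frac{z_i z_j}{1 + x_i y_j + x_j y_i + z_i z_j}, \qquad p_{ij}^{\nleftrightarrow} \equiv \frac{1}{1 + x_i y_j + x_j y_i + z_i z_j}. \qquad (22) $$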
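The four dyadic probabilities of the RCM can be evaluated for all pairs at once. The sketch below assumes NumPy and that the parameter vectors `x`, `y`, `z` are already known (the numerical values used here are made up purely to exercise the formulas); the function name is illustrative.

```python
import numpy as np

def rcm_dyad_probabilities(x, y, z):
    """Matrices of p^{->}, p^{<-}, p^{<->} and p^{no link} for every pair (i, j)."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    xy = np.outer(x, y)                  # x_i y_j
    zz = np.outer(z, z)                  # z_i z_j
    denom = 1.0 + xy + xy.T + zz         # common denominator, symmetric in i and j
    off = 1.0 - np.eye(len(x))           # excludes i == j
    p_right = off * xy / denom           # single link i -> j
    p_left = off * xy.T / denom          # single link j -> i
    p_both = off * zz / denom            # reciprocated pair of links
    p_none = off / denom                 # no link
    return p_right, p_left, p_both, p_none

# Hypothetical parameter values, chosen only for illustration
x = [1.0, 2.0, 0.5]
y = [0.5, 1.0, 2.0]
z = [1.0, 1.0, 1.0]
p_right, p_left, p_both, p_none = rcm_dyad_probabilities(x, y, z)
```

The four matrices sum to one for every pair of distinct vertices, and `p_left` is the transpose of `p_right`, reflecting the fact that the same dyad is seen from its two endpoints.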



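The Log-Likelihood Equations. The log-likelihood function to maximize is

$$ \mathcal{L}(x,y,z) = \sum_i [k_i^{\to}(A^*) \ln x_i + k_i^{\leftarrow}(A^*) \ln y_i + k_i^{\leftrightarrow}(A^*) \ln z_i] - \sum_{i<j} \ln(1 + x_i y_j + x_j y_i + z_i z_j) \qquad (23) $$

and the values $x^*$, $y^*$, $z^*$ corresponding to the point of maximum can be found
by solving the following system

$$ k_i^{\to}(A^*) = \sum_{j \neq i} (p_{ij}^{\to})^*, \qquad k_i^{\leftarrow}(A^*) = \sum_{j \neq i} (p_{ij}^{\leftarrow})^*, \qquad k_i^{\leftrightarrow}(A^*) = \sum_{j \neq i} (p_{ij}^{\leftrightarrow})^* \qquad \forall i. \qquad (24) $$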



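Expectation Values and Variances. The expectation value and variance of
$a_{ij}^{\to}$ (and equivalently for the other dyadic variables) are, respectively,
$\langle a_{ij}^{\to} \rangle^* = (p_{ij}^{\to})^*$ and
$(\sigma^*[a_{ij}^{\to}])^2 = \langle a_{ij}^{\to} \rangle^* (1 - \langle a_{ij}^{\to} \rangle^*) = (p_{ij}^{\to})^* (1 - (p_{ij}^{\to})^*)$.
Considering that two distinct dyads can be treated as independent random
variables and that the second definition of the motif occurrences only involves
products of dyads, their expectation values can be easily computed, as shown in
Table 2. The variance of $N_m$ becomes

$$ (\sigma^*[N_m])^2 = \sum_{i \neq j} \left[ \left( \sigma^*[a_{ij}] \frac{\partial N_m}{\partial a_{ij}} \right)^2_{A=\langle A \rangle^*} + \sigma^*[a_{ij}, a_{ji}] \left( \frac{\partial N_m}{\partial a_{ij}} \frac{\partial N_m}{\partial a_{ji}} \right)_{A=\langle A \rangle^*} \right] \qquad (25) $$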

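where now
$(\sigma^*[a_{ij}])^2 = \langle a_{ij}^{\to} + a_{ij}^{\leftrightarrow} \rangle^* (1 - \langle a_{ij}^{\to} + a_{ij}^{\leftrightarrow} \rangle^*)$
and
$\sigma^*[a_{ij}, a_{ji}] = \langle a_{ij}^{\leftrightarrow} \rangle^* - \langle a_{ij} \rangle^* \langle a_{ji} \rangle^*$.

Table 2. Expected motif occurrences under the DCM (first definition, with
$a_{ij}$ replaced by $p_{ij}$) and under the RCM (second definition, with each dyadic
variable replaced by the corresponding dyadic probability). All sums run over
triples of distinct vertices, $i \neq j \neq k$.

Motif m   <N_m> under the DCM                                               <N_m> under the RCM
 1        $\sum (1-p_{ij}) p_{ji} p_{jk} (1-p_{kj}) (1-p_{ik}) (1-p_{ki})$     $\sum p_{ij}^{\leftarrow} p_{jk}^{\to} p_{ik}^{\nleftrightarrow}$
 2        $\sum p_{ij} (1-p_{ji}) p_{jk} (1-p_{kj}) (1-p_{ik}) (1-p_{ki})$     $\sum p_{ij}^{\to} p_{jk}^{\to} p_{ik}^{\nleftrightarrow}$
 3        $\sum p_{ij} p_{ji} p_{jk} (1-p_{kj}) (1-p_{ik}) (1-p_{ki})$         $\sum p_{ij}^{\leftrightarrow} p_{jk}^{\to} p_{ik}^{\nleftrightarrow}$
 4        $\sum (1-p_{ij}) (1-p_{ji}) p_{jk} (1-p_{kj}) p_{ik} (1-p_{ki})$     $\sum p_{ij}^{\nleftrightarrow} p_{jk}^{\to} p_{ik}^{\to}$
 5        $\sum (1-p_{ij}) p_{ji} p_{jk} (1-p_{kj}) p_{ik} (1-p_{ki})$         $\sum p_{ij}^{\leftarrow} p_{jk}^{\to} p_{ik}^{\to}$
 6        $\sum p_{ij} p_{ji} p_{jk} (1-p_{kj}) p_{ik} (1-p_{ki})$             $\sum p_{ij}^{\leftrightarrow} p_{jk}^{\to} p_{ik}^{\to}$
 7        $\sum p_{ij} p_{ji} (1-p_{jk}) p_{kj} (1-p_{ik}) (1-p_{ki})$         $\sum p_{ij}^{\leftrightarrow} p_{jk}^{\leftarrow} p_{ik}^{\nleftrightarrow}$
 8        $\sum p_{ij} p_{ji} p_{jk} p_{kj} (1-p_{ik}) (1-p_{ki})$             $\sum p_{ij}^{\leftrightarrow} p_{jk}^{\leftrightarrow} p_{ik}^{\nleftrightarrow}$
 9        $\sum (1-p_{ij}) p_{ji} (1-p_{jk}) p_{kj} p_{ik} (1-p_{ki})$         $\sum p_{ij}^{\leftarrow} p_{jk}^{\leftarrow} p_{ik}^{\to}$
10        $\sum (1-p_{ij}) p_{ji} p_{jk} p_{kj} p_{ik} (1-p_{ki})$             $\sum p_{ij}^{\leftarrow} p_{jk}^{\leftrightarrow} p_{ik}^{\to}$
11        $\sum p_{ij} (1-p_{ji}) p_{jk} p_{kj} p_{ik} (1-p_{ki})$             $\sum p_{ij}^{\to} p_{jk}^{\leftrightarrow} p_{ik}^{\to}$
12        $\sum p_{ij} p_{ji} p_{jk} p_{kj} p_{ik} (1-p_{ki})$                 $\sum p_{ij}^{\leftrightarrow} p_{jk}^{\leftrightarrow} p_{ik}^{\to}$
13        $\sum p_{ij} p_{ji} p_{jk} p_{kj} p_{ik} p_{ki}$                     $\sum p_{ij}^{\leftrightarrow} p_{jk}^{\leftrightarrow} p_{ik}^{\leftrightarrow}$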